In many practical situations, exploratory plots are helpful in understanding the tail behavior of sample data. The Mean
Excess (ME) plot is often applied in practice to understand the right tail behavior of a data set. It is known that if the
underlying distribution of a data sample is in the domain of attraction of a Fréchet, Gumbel or Weibull distribution,
then the ME plot of the data tends to a straight line in an appropriate sense, with positive, zero or negative slope,
respectively. In this paper we construct confidence intervals around the ME plots which assist us in ascertaining
which particular maximum domain of attraction the data set comes from. We recall weak limit results for the Fréchet
domain of attraction, already obtained in Das and Ghosh (2013), and derive weak limits for the Gumbel and Weibull
domains in order to construct confidence bounds. We test our methods on both simulated and real data sets.
1. Introduction

This article concerns the use of mean excess plots, a popular exploratory tool used to understand the tail
behavior of a univariate data set. Given a sample of data points, one of the first things a sensible data analyst
does is compute summary statistics. Such a summary might involve calculating measures of
central tendency (mean, median, mode) and of dispersion (standard deviation, range, etc.), plotting a
histogram, an empirical cumulative distribution function, and so on. A more curious analyst would
ask: does it even make sense to calculate the sample mean or standard deviation; would they
represent their counterparts in the original population? What if the probability distribution of the population
from which the data is sampled does not even have a first or second moment? This is a question that would, or
perhaps should, come to the mind of analysts modeling risk or other extreme events. In a world
where data is being used to make serious economic, financial or environmental policy decisions, understanding
extreme risks, which relate to the tail behavior of data sets, has become increasingly important. This can
be easily observed in the world of finance and insurance (Das, Embrechts and Fasen, 2013; Donnelly and
Embrechts, 2010; McNeil, Frey and Embrechts, 2005), telecommunications (Maulik, Resnick and Rootzén,
2002), environmental statistics (Davison and Smith, 1990) and many more areas.

The mean excess (ME) plot is a graphical tool that is widely used to understand the tail behavior of a
sample; see Embrechts, Klüppelberg and Mikosch (1997); Davison and Smith (1990). An ME plot,
if the mean exists, assists in distinguishing light-tailed data sets from heavy-tailed ones. The inference is
based on a visual examination of whether the slope of a fitted line through the ME plot (to be described in the next
section) is zero, less than zero or greater than zero. Clearly, a confidence set around the fitted line would
make inference in these cases more meaningful; constructing such confidence sets is the aim of the paper.
1.1. The ME plot


The ME plot, as described in the introduction, is a popular tool in extreme value analysis. It is a simple
graphical test to check whether data conform to a generalised Pareto distribution (GPD). The class of GPDs
arises naturally in extreme value analysis as limit distributions when using the peaks-over-threshold (POT)
method (Beirlant et al., 2004; Davison and Smith, 1990). The cumulative distribution function of a GPD is:

$$G_{\xi,\beta}(x) = \begin{cases} 1 - (1 + \xi x/\beta)^{-1/\xi} & \text{if } \xi \neq 0, \\ 1 - \exp(-x/\beta) & \text{if } \xi = 0, \end{cases} \tag{1.1}$$

where $\beta > 0$, and $x \ge 0$ when $\xi \ge 0$ and $0 \le x \le -\beta/\xi$ if $\xi < 0$. Parameters $\xi$ and $\beta$ are referred to as
the shape and the scale parameter respectively. In extreme value analysis, we are interested in the shape
parameter $\xi$, which tells us whether the data is heavy-tailed ($\xi > 0$) or light-tailed ($\xi \le 0$) or, even more
specifically, whether the underlying distribution has a finite right end-point ($\xi < 0$). The case $\xi > 0$ and $\beta = 1$
corresponds to the classical Pareto law with tail exponent $1/\xi$.
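As a concrete illustration of the GPD family, the following minimal Python sketch evaluates the distribution function $G_{\xi,\beta}(x) = 1 - (1 + \xi x/\beta)^{-1/\xi}$ (with the exponential form at $\xi = 0$) and draws samples by inverse transform. The function names `gpd_cdf`, `gpd_quantile` and `gpd_sample` are ours, chosen for illustration only; they are not part of the paper or of any particular library.

```python
import numpy as np

def gpd_cdf(x, xi, beta=1.0):
    """G_{xi,beta}(x): 0 for x <= 0, and 1 beyond the right end point when xi < 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    x = np.maximum(np.asarray(x, dtype=float), 0.0)
    if xi == 0.0:
        return 1.0 - np.exp(-x / beta)
    # Clip 1 + xi*x/beta at 0 so that x >= -beta/xi (xi < 0) maps to cdf 1.
    base = np.maximum(1.0 + xi * x / beta, 0.0)
    return 1.0 - base ** (-1.0 / xi)

def gpd_quantile(p, xi, beta=1.0):
    """Inverse of gpd_cdf for probabilities 0 <= p < 1."""
    p = np.asarray(p, dtype=float)
    if xi == 0.0:
        return -beta * np.log1p(-p)
    return beta / xi * ((1.0 - p) ** (-xi) - 1.0)

def gpd_sample(n, xi, beta=1.0, seed=None):
    """Inverse-transform sampling: apply the quantile function to uniforms."""
    rng = np.random.default_rng(seed)
    return gpd_quantile(rng.uniform(size=n), xi, beta)
```

For instance, `gpd_cdf(0.5, -1.0)` recovers the Uniform(0,1) distribution function at 0.5, since $\xi = -1$, $\beta = 1$ gives $G(x) = x$ on $[0,1]$.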

The ME plot is an empirical graphical plot of the ME function of a random variable $X \sim F$, which is
defined as:

$$M(u) := E[X - u \mid X > u], \tag{1.2}$$

provided $E X^{+} < \infty$. The ME function is also known as the mean residual life function for non-negative
random variables and is extensively used in reliability theory and survival analysis for data modelling since
$M(u)$ completely determines $F$ if $E(X) < \infty$ (Hall and Wellner, 1981). Suppose we have an iid sample
$X_1, \ldots, X_n \sim F$. A natural estimate of $M(u)$ is the empirical ME function $\hat M(u)$ defined as

$$\hat M(u) = \frac{\sum_{i=1}^{n} (X_i - u)\, I_{[X_i > u]}}{\sum_{i=1}^{n} I_{[X_i > u]}}, \qquad u \ge 0. \tag{1.3}$$

Denoting by $X_{(1)} \ge X_{(2)} \ge \ldots \ge X_{(n)}$ the order statistics from a sample $X_1, \ldots, X_n$, the ME plot is a
plot of the points

$$\mathcal{ME}_n := \{(X_{(k)}, \hat M(X_{(k)})) : 1 < k \le n\}.$$
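The empirical ME function and the ME plot points can be computed in a few lines. The sketch below is ours (the names `me_hat` and `me_plot` are illustrative): it evaluates the average excess over a threshold directly, and obtains all plot points at once from cumulative sums of the decreasing order statistics. It assumes there are no ties in the sample, which holds almost surely when $F$ is continuous.

```python
import numpy as np

def me_hat(u, sample):
    """Empirical mean excess over threshold u: average of (X_i - u) over X_i > u."""
    x = np.asarray(sample, dtype=float)
    exceed = x[x > u]
    if exceed.size == 0:
        raise ValueError("no observations exceed u")
    return float(np.mean(exceed - u))

def me_plot(sample):
    """Return (thresholds, mean_excess) at X_(2) >= ... >= X_(n), assuming no ties."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]   # X_(1) >= ... >= X_(n)
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")
    csum = np.cumsum(x)               # csum[j] = X_(1) + ... + X_(j+1)
    k = np.arange(2, n + 1)           # ranks 2, ..., n
    # Exactly k-1 observations lie strictly above X_(k) when there are no ties.
    mean_excess = csum[k - 2] / (k - 1) - x[k - 1]
    return x[k - 1], mean_excess
```

Plotting `mean_excess` against `thresholds` (for example with matplotlib) gives the ME plot; the rank $k = 1$ is skipped because no observation exceeds the sample maximum.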

We study the asymptotic properties of $\mathcal{ME}_n$ for different classes of distributions $F$. It is well-known that
for a random variable $X \sim G_{\xi,\beta}$, we have $E(X) < \infty$ if and only if $\xi < 1$ and in this case, the ME function
of $X$ is linear in $u$:

$$M(u) = \frac{\beta}{1-\xi} + \frac{\xi}{1-\xi}\, u, \tag{1.4}$$

where $0 \le u < \infty$ if $0 \le \xi < 1$ and $0 \le u \le -\beta/\xi$ if $\xi < 0$.
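For the reader's convenience, here is a short derivation of the linear form of the GPD mean excess function for $\xi \neq 0$, $\xi < 1$. It is our own sketch and uses only the standard identity $M(u) = \int_u^{x_F} \bar G(x)\,dx / \bar G(u)$, where $\bar G = 1 - G_{\xi,\beta}$ and $x_F$ denotes the right end point ($x_F = \infty$ if $\xi > 0$ and $x_F = -\beta/\xi$ if $\xi < 0$).

```latex
\begin{align*}
\int_u^{x_F} \Bigl(1 + \tfrac{\xi x}{\beta}\Bigr)^{-1/\xi} dx
  &= \frac{\beta}{\xi}\,\frac{1}{1 - 1/\xi}
     \Bigl[\Bigl(1 + \tfrac{\xi x}{\beta}\Bigr)^{1 - 1/\xi}\Bigr]_{x=u}^{x=x_F}
   = \frac{\beta}{1-\xi}\Bigl(1 + \tfrac{\xi u}{\beta}\Bigr)^{1 - 1/\xi}, \\
M(u)
  &= \frac{\frac{\beta}{1-\xi}\bigl(1 + \frac{\xi u}{\beta}\bigr)^{1 - 1/\xi}}
          {\bigl(1 + \frac{\xi u}{\beta}\bigr)^{-1/\xi}}
   = \frac{\beta}{1-\xi}\Bigl(1 + \frac{\xi u}{\beta}\Bigr)
   = \frac{\beta}{1-\xi} + \frac{\xi}{1-\xi}\,u .
\end{align*}
```

The boundary term at $x_F$ vanishes in both cases: for $0 < \xi < 1$ the exponent $1 - 1/\xi$ is negative and the base tends to infinity, while for $\xi < 0$ the exponent is positive and the base tends to zero. The slope $\xi/(1-\xi)$ therefore has the same sign as $\xi$.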

Interestingly, the linearity of the ME function characterises the GPD class (McNeil, Frey and Embrechts,
2005; Embrechts, Klüppelberg and Mikosch, 1997). From the discussions in Ghosh and Resnick (2010) we have
learnt that the empirical ME plot $\mathcal{ME}_n$ above a high order statistic $X_{(k)}$, when appropriately normalised,
converges in probability to a straight line if $F$ is in the maximal domain of attraction of any generalized
extreme value distribution with finite mean (Gumbel, Weibull or Fréchet distribution). Distributional limits
for $\mathcal{ME}_n$ in a space of closed sets and confidence intervals around $\mathcal{ME}_n$ can also be computed in many
cases, and such findings have been discussed for the case where the underlying data is heavy-tailed (Fréchet domain of
attraction) in Das and Ghosh (2013).

In Section 1.2 we collect notations and ideas to be used throughout the paper. See Das and Ghosh (2013)
for further elaboration of the concepts of convergence of (random) closed sets in this context. In the main
part of the paper we start by consolidating a few results which are already known on the distributional
properties of the ME plot, especially in the heavy-tailed case (Das and Ghosh, 2013); this is covered in Section
2.1. The rest of Section 2 deals with limit results for ME plots in the case where the underlying distribution
is either in the Gumbel maximum domain of attraction or in a Weibull maximum domain of attraction. The
limit theorems proved in Section 2 are used to create confidence bounds around the ME plots in Section 3. In
Section 4 we test the tools developed in Sections 2 and 3 on simulated data as well as real
data sets.
First we recall the idea of maximum domain of attraction of an extreme value distribution. The class of
extreme value distributions is parametrized by a shape parameter $\xi \in \mathbb{R}$ and we define the distribution
function $G_\xi$ to be

$$G_\xi(x) = \exp\bigl(-(1 + \xi x)^{-1/\xi}\bigr), \qquad 1 + \xi x > 0,$$

for all real $\xi$; for $\xi = 0$, the right hand side is interpreted as $\exp(-e^{-x})$.

Definition 1.1. A distribution function $F$ (or the underlying random variable $X \sim F$) is in the maximum
domain of attraction of an extreme value distribution $G_\xi$, written $F \in D(G_\xi)$, if there exist sequences $c_n > 0$ and $d_n \in \mathbb{R}$ such
that

$$F^n(c_n x + d_n) \to G_\xi(x) \quad \text{for all } x \in \mathbb{R}.$$


The distributions for the cases $\xi > 0$, $\xi = 0$ and $\xi < 0$ are respectively called the Fréchet distribution, the
Gumbel distribution and the Weibull distribution. As mentioned in the introduction, if $F \in D(G_\xi)$ for some
extreme value distribution with $\xi < 1$, implying that $F$ has finite mean, then the ME function of $F$ is asymptotically linear
with an appropriate slope determined by the parameter $\xi$; see Ghosh and Resnick (2010).

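Throughout this paper we will take $k := k_n$ to be a sequence increasing to infinity such that $n/k_n \to \infty$, or equivalently
$k_n/n \to 0$. For a distribution function $F(x)$ we write $\bar F(x) := 1 - F(x)$ for the tail, and the quantile function
is

$$b(u) := F^{\leftarrow}\Bigl(1 - \frac{1}{u}\Bigr) := \inf\Bigl\{s : F(s) \ge 1 - \frac{1}{u}\Bigr\} = \Bigl(\frac{1}{1-F}\Bigr)^{\leftarrow}(u).$$

A function $U : (0, \infty) \to \mathbb{R}_+$ is regularly varying with index $\rho \in \mathbb{R}$, written $U \in RV_\rho$, if

$$\lim_{t \to \infty} \frac{U(tx)}{U(t)} = x^{\rho}, \qquad x > 0.$$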


If $X \sim F$ we will often have the right-hand tail of $F$ to be regularly varying, that is, $\bar F \in RV_{-\alpha}$ for $\alpha > 0$,
and by abuse of notation we might say $X \in RV_{-\alpha}$. Regular variation is discussed in several books such
as Resnick (2007, 2008); Seneta (1976); Geluk and de Haan (1987); de Haan (1970); de Haan and Ferreira
(2006); Bingham, Goldie and Teugels (1987).

We use $M_+(0, \infty]$ to denote the space of nonnegative Radon measures $\mu$ on $(0, \infty]$ metrized by the vague
metric. Point measures are written as a function of their points $\{x_i, i = 1, \ldots, n\}$ by $\sum_{i=1}^{n} \epsilon_{x_i}$. See, for
example, (Resnick, 2008, Chapter 3).

We will use the following notations to denote different classes of functions: For $0 \le a < b \le \infty$,

1. $C[a, b)$: Continuous functions on $[a, b)$.

2. $D[a, b)$: Right-continuous functions with finite left limits defined on $[a, b)$.

3. $D_l[a, b)$: Left-continuous functions with finite right limits defined on $[a, b)$.

It is known that $D[0,1]$ is complete and separable under a metric $d_0(\cdot)$ which is equivalent to the Skorohod
metric $d_S(\cdot)$ (Billingsley, 1968, p.128), but not under the uniform metric $\|\cdot\|$. As we will see, the limit
processes that appear in our analysis below are always continuous. We can check that if $x$ is continuous (in
fact uniformly continuous) in $[0,1]$, then for $x_n \in D[0,1]$, $\|x_n - x\| \to 0$ is equivalent to $d_S(x_n, x) \to 0$ and hence
equivalent to $d_0(x_n, x) \to 0$ as $n \to \infty$ (Billingsley, 1968, p.124). So we use convergence in the uniform metric
for our convenience henceforth. For spaces of the form $D[a, b)$ or $D_l[a, b)$ we will consider the topology of local
uniform convergence. In some cases we will also consider product spaces of functions and then the topology
will be the product topology. For example, $D_l^2[1, \infty)$ will denote the class of 2-dimensional functions on $[1, \infty)$
which are left-continuous with right limits. The classes of functions defined on the sets $[a, b]$ or $(a, b]$ will have
the obvious notation. For further details on notions of convergence and topology for convergences of plots
see Das and Ghosh (2013).
2. Limit results for the ME Plots


In this section we find distributional limits for ME plots when they exist. We continue the study of ME plots
from Ghosh and Resnick (2010) and Das and Ghosh (2013) and give a complete picture of limit results for
ME plots. We cite some of the results from the aforementioned papers for completeness.

The basic assumption is that we have an iid sample of data points from some unknown distribution F 
which belongs to the maximum domain of attraction of one of the three extreme value distributions. The 
assumption of independence in the sample can be relaxed a bit under certain conditions which we do not 
explore here. 

Suppose $X_1, \ldots, X_n$ is an i.i.d. sample from a distribution $F$. We will work under this assumption for the
entire section. The properties of the empirical ME function $\hat M(u)$ as an estimator of $M(u)$ have been studied
by Yang (1978). It was shown there that $\hat M(u)$ is uniformly strongly consistent for $M(u)$: for any $0 < b < \infty$,

$$P\Bigl[\lim_{n \to \infty} \sup_{0 \le u \le b} \bigl|\hat M(u) - M(u)\bigr| = 0\Bigr] = 1.$$

A weak (distributional) limit for $\hat M(u)$ was also shown in Yang (1978): for any $0 < b < 1$,

$$\sqrt{n}\bigl(\hat M(F^{\leftarrow}(t)) - M(F^{\leftarrow}(t))\bigr) \Rightarrow U(t),$$

where $U(t)$ is a Gaussian process on $[0, b]$ with covariance function

$$\Gamma(s,t) = \frac{(1-t)\sigma^2(t) - t\,\theta^2(t)}{(1-s)(1-t)^2} \quad \text{for all } 0 \le s \le t \le b,$$

with

$$\sigma^2(t) = \mathrm{var}\bigl(X I_{[t < F(X) \le 1]}\bigr) \quad \text{and} \quad \theta(t) = E\bigl(X I_{[t < F(X) \le 1]}\bigr).$$

Using Lemma 2.4 in Das and Ghosh (2013) it is easy to see that the ME plots also exhibit the same features.
Our interest in ME plots is for detecting right tail behavior of data samples (an equivalent case can be
easily made for left tail behavior). Hence the linearity we seek in the ME plot will be for high thresholds.
Necessarily, the ME plots we will discuss in the various cases will be transformations of the ME plot over an
appropriate quantile, i.e., $\{(X_{(i)}, \hat M(X_{(i)})) : 1 < i \le k\}$ for $k := k(n) < n$, where $\hat M$ is as defined in (1.3).


2.1. ME plot in the Fréchet case

First we look at the case where the underlying distribution $F$ is heavy-tailed, in the sense that $F \in D(G_\xi)$
with $\xi > 0$ or, in other words, $\bar F \in RV_{-1/\xi}$. We define the ME plot as:

$$\mathcal{M}_n := \frac{1}{X_{(k)}}\bigl\{(X_{(i)}, \hat M(X_{(i)})) : i = 2, \ldots, k\bigr\}. \tag{2.1}$$

From (Ghosh and Resnick, 2010, Theorem 3.2), we know that for $0 < \xi < 1$, as $n, k, n/k \to \infty$,

$$\mathcal{M}_n \xrightarrow{P} \mathcal{M} := \Bigl\{\Bigl(t, \frac{\xi}{1-\xi}\, t\Bigr) : t \ge 1\Bigr\}.$$

The distributional behavior of $\mathcal{M}_n$ depends on whether $F$ has a finite second moment or not and has been
discussed under certain regularity conditions in Das and Ghosh (2013). We note the results below.

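Case 1 ($0 < \xi < 1/2$): For any $0 < \varepsilon < 1$, as $n, k, n/k \to \infty$,

$$\mathcal{MN}_n := \Bigl\{\Bigl(\bigl(\tfrac{i}{k}\bigr)^{-\xi}, \tfrac{\xi}{1-\xi}\bigl(\tfrac{i}{k}\bigr)^{-\xi}\Bigr) + \sqrt{k}\Bigl(\frac{X_{(i)}}{X_{(k)}} - \bigl(\tfrac{i}{k}\bigr)^{-\xi},\; \frac{\hat M(X_{(i)})}{X_{(k)}} - \tfrac{\xi}{1-\xi}\bigl(\tfrac{i}{k}\bigr)^{-\xi}\Bigr) : i = \lceil \varepsilon k \rceil, \ldots, k\Bigr\}$$

$$\Rightarrow \mathcal{MN} := \Bigl\{\Bigl(t^{-\xi} + \xi t^{-(1+\xi)} B(t),\; \tfrac{\xi}{1-\xi}\, t^{-\xi} + \xi t^{-1} \int_0^t y^{-(1+\xi)} B(y)\,dy\Bigr),\ \varepsilon \le t \le 1\Bigr\} \quad \text{in } \mathcal{F}, \tag{2.2}$$

where $B(t)$ is the standard Brownian bridge on $[0,1]$. This is the case where $F$ has a finite second moment
and hence the distributional limit has a Brownian component.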

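Case 2 ($1/2 < \xi < 1$): For any $0 < \varepsilon < 1$, as $n, k, n/k \to \infty$,

$$\mathcal{MN}_n := \Bigl\{\Bigl(\bigl(\tfrac{i}{k}\bigr)^{-\xi}, \tfrac{\xi}{1-\xi}\bigl(\tfrac{i}{k}\bigr)^{-\xi}\Bigr) + \Bigl(\sqrt{k}\Bigl(\frac{X_{(i)}}{X_{(k)}} - \bigl(\tfrac{i}{k}\bigr)^{-\xi}\Bigr),\; \frac{k\, b(n/k)}{b(n)}\Bigl(\frac{\hat M(X_{(i)})}{X_{(k)}} - \tfrac{\xi}{1-\xi}\bigl(\tfrac{i}{k}\bigr)^{-\xi}\Bigr)\Bigr) : i = \lceil \varepsilon k \rceil, \ldots, k\Bigr\}$$

$$\Rightarrow \mathcal{MN} := \Bigl\{\Bigl(t^{-\xi} + \xi t^{-(1+\xi)} B(t),\; \tfrac{\xi}{1-\xi}\, t^{-\xi} + t^{-1} S_{1/\xi}\Bigr),\ \varepsilon \le t \le 1\Bigr\} \quad \text{in } \mathcal{F},$$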

where $S_{1/\xi}$ is a stable random variable with characteristic function


E\e itSi m =exp<; - 


1-4 


r(2- ^ cos 


24 1 


1 - i sgn (t) tan — 


(2.3) 


(2.4) 


and is independent of the standard Brownian bridge $B(t)$ on $[0,1]$. This is the case where $F$ has a finite
mean but does not have a finite second moment, hence we also observe a non-Gaussian stable weak limit.
The results in (2.2) and (2.3) are described in detail in Theorems 4.3 and 4.6 of Das and Ghosh (2013).


2.2. ME plot in the Gumbel case 

The behaviour (in probability) of the ME plot when $F$ is in the maximum domain of attraction of a Gumbel
distribution has been discussed in Ghosh and Resnick (2010). We state the following result to recall notations
to be used; this follows from Theorems 3.3.26 and 3.4.13(b) in Embrechts, Klüppelberg and Mikosch (1997);
see (Ghosh and Resnick, 2010, Theorem 3.9) or (Resnick, 2008, Proposition 1.4) for further details.

Proposition 2.1. The following are equivalent for a distribution function $F$ with right end point $x_F \le \infty$:

1. $F$ is in the maximum domain of attraction of the Gumbel distribution, i.e.,

$$F^n(c(n)x + d(n)) \to \exp\{-e^{-x}\} \quad \text{for all } x \in \mathbb{R}, \tag{2.5}$$

for some sequences $c(n)$ and $d(n)$.

2. There exists $z < x_F$ such that $\bar F$ has a representation

$$\bar F(x) = \kappa(x) \exp\Bigl\{-\int_z^x \frac{1}{a(t)}\,dt\Bigr\} \quad \text{for all } z < x < x_F, \tag{2.6}$$

where $\kappa(x)$ is a measurable function satisfying $\kappa(x) \to \kappa > 0$ as $x \to x_F$, and $a(x)$ is a positive, absolutely
continuous function with density $a'(x) \to 0$ as $x \to x_F$.

We know from (Resnick, 2008, Proposition 1.1) that a choice of the norming sequences $c(n)$ and $d(n)$ in
(2.5) is

$$d(n) = F^{\leftarrow}(1 - n^{-1}) \quad \text{and} \quad c(n) = a(d(n)).$$

















B. Das and S. Ghosh 


Theorem 3.3.26 in Embrechts, Klüppelberg and Mikosch (1997) says that a choice of the auxiliary function
$a(x)$ in (2.6) is

$$a(x) = \int_x^{x_F} \frac{\bar F(t)}{\bar F(x)}\,dt \quad \text{for all } x < x_F,$$

and for this choice, the auxiliary function is the ME function, i.e., $a(x) = M(x)$. Furthermore, we also know
that $a'(x) \to 0$ as $x \to x_F$ and this implies that $M(u)/u \to 0$ as $u \to x_F$. Define the ME plot in this case as

$$\mathcal{M}_n := \frac{1}{X_{(\lceil k/e \rceil)} - X_{(k)}}\bigl\{(X_{(i)} - X_{(k)}, \hat M(X_{(i)})) : i = 2, \ldots, k\bigr\}. \tag{2.7}$$

From a minor modification of (Ghosh and Resnick, 2010, Theorem 3.10), we know that as $n, k, n/k \to \infty$,
$\mathcal{M}_n \xrightarrow{P} \mathcal{M} := \{(t, 1) : t \ge 0\}$. We now impose one additional condition in order to get a weak limit
for ME plots in the Gumbel case, stated as follows.


Assumption 2.2. The distribution function $F$ satisfies the following:

$$\sqrt{k}\Bigl(\frac{n}{k}\bar F\bigl(c(n/k)y + d(n/k)\bigr) - e^{-y}\Bigr) \to 0 \tag{2.8}$$

point-wise and in $L_1$-norm in $[0, \infty)$ as $n, k, n/k \to \infty$.

Now we can state the distributional result for ME plots when F is in the maximum domain of attraction 
of the Gumbel distribution. 


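Theorem 2.3. Suppose $X_1, \ldots, X_n$ are i.i.d. observations from a distribution $F$ which is in the maximum
domain of attraction of the Gumbel distribution and satisfies Assumption 2.2. Then for any $0 < \varepsilon < 1$, as
$n, k, n/k \to \infty$,

$$\mathcal{MN}_n := \Bigl\{\Bigl(-\ln\tfrac{i}{k},\, 1\Bigr) + \sqrt{k}\Bigl(\frac{X_{(i)} - X_{(k)}}{X_{(\lceil k/e \rceil)} - X_{(k)}} + \ln\tfrac{i}{k},\; \frac{\hat M(X_{(i)})}{X_{(\lceil k/e \rceil)} - X_{(k)}} - 1\Bigr) : i = \lceil \varepsilon k \rceil, \ldots, k\Bigr\}$$

$$\Rightarrow \mathcal{MN} := \Bigl\{\Bigl(-\ln t + eB(e^{-1})\ln t + \frac{B(t)}{t},\; 1 + eB(e^{-1}) + \frac{1}{t}\int_0^t \frac{B(s)}{s}\,ds\Bigr),\ \varepsilon \le t \le 1\Bigr\} \quad \text{in } \mathcal{F},$$

where $B(t)$ is the standard Brownian bridge on $[0,1]$.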


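Proof. The proof is along the same lines as the proof of Theorem 4.3 in Das and Ghosh (2013). Denote the
tail empirical measures by

$$\nu_n(\cdot) := \frac{1}{k}\sum_{i=1}^{n} \epsilon_{\frac{X_i - d(n/k)}{c(n/k)}}(\cdot) \quad \text{and} \tag{2.9}$$

$$\hat\nu_n(\cdot) := \frac{1}{k}\sum_{i=1}^{n} \epsilon_{\frac{X_i - X_{(k)}}{c(n/k)}}(\cdot), \tag{2.10}$$

and define for $k := k(n) < n$ and $0 < t \le 1$:

$$W_n(t) := \sqrt{k}\bigl(\nu_n(-\ln t, \infty] - t\bigr) = \sqrt{k}\Bigl(\frac{1}{k}\sum_{i=1}^{n} \epsilon_{\frac{X_i - d(n/k)}{c(n/k)}}(-\ln t, \infty] - t\Bigr).$$

We prove in Lemma 2.4 that $W_n \Rightarrow W$ in $D_l(0,1]$, where $W$ is the standard Brownian motion on $[0,1]$.
Applying Vervaat's lemma (Resnick, 2007, Proposition 3.3, p.59) to (2.2) we get

$$\sqrt{k}\Bigl(\exp\Bigl\{-\frac{X_{(\lceil kt \rceil)} - d(n/k)}{c(n/k)}\Bigr\} - t,\; \nu_n(-\ln t, \infty] - t\Bigr) \Rightarrow (-W(t), W(t)) \quad \text{in } D_l^2(0,1].$$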
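Hence

$$\sqrt{k}\Bigl(\frac{X_{(\lceil kt \rceil)} - d(n/k)}{c(n/k)} + \ln t,\; \nu_n(y, \infty] - e^{-y}\Bigr) \Rightarrow \Bigl(\frac{W(t)}{t},\, W(e^{-y})\Bigr) \quad \text{in } D_l(0,1] \times D[0, \infty), \tag{2.11}$$

and it is easy to check that

$$\sqrt{k}\Bigl(\frac{X_{(\lceil kt \rceil)} - X_{(k)}}{c(n/k)} + \ln t,\; \hat\nu_n(y, \infty] - e^{-y}\Bigr) \Rightarrow \Bigl(\frac{B(t)}{t},\, B(e^{-y})\Bigr) \quad \text{in } D_l(0,1] \times D[0, \infty). \tag{2.12}$$

Then following arguments used in the proof of Theorem 4.3 in Das and Ghosh (2013) we get

$$\sqrt{k}\Bigl(\frac{X_{(\lceil kt \rceil)} - d(n/k)}{c(n/k)} + \ln t,\; \frac{\hat M(X_{(\lceil kt \rceil)})}{c(n/k)} - 1\Bigr) = \sqrt{k}\Bigl(\frac{X_{(\lceil kt \rceil)} - d(n/k)}{c(n/k)} + \ln t,\; \frac{k}{\lceil kt \rceil - 1}\int_{\frac{X_{(\lceil kt \rceil)} - d(n/k)}{c(n/k)}}^{\infty} \nu_n(y, \infty]\,dy - 1\Bigr)$$

$$\Rightarrow \Bigl(\frac{B(t)}{t},\; \frac{1}{t}\int_{-\ln t}^{\infty} B(e^{-y})\,dy\Bigr) = \Bigl(\frac{B(t)}{t},\; \frac{1}{t}\int_0^t \frac{B(s)}{s}\,ds\Bigr).$$

The proof of the theorem is completed by invoking Lemma 2.4 in Das and Ghosh (2013). □

Lemma 2.4. As $n \to \infty$, $k \to \infty$, $n/k \to \infty$,

$$W_n \Rightarrow W$$

in $D_l(0,1]$, where $W$ is a Brownian motion in $D_l(0,1]$.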


Proof. We check the conditions C1-C4 of (Rootzén, 2009, Theorem 2.1). In this part of the proof, whenever
we write $\sim$ between two expressions, it means the asymptotics hold for $n, k, n/k \to \infty$. Now following the
notations used in the aforementioned paper, we set $r_n = \min\{k^{1/4}, (n/k)^{1/2}\}$ and $l_n = 1$. For any $u, v$
let

$$N_n(u, v) := \sum_{i=1}^{r_n} \epsilon_{\frac{X_i - d(n/k)}{c(n/k)}}(u, v].$$

Then for any $\theta < x_F = \infty$ (since $F$ is in a Gumbel domain of attraction, it has right end point $x_F = \infty$)
with $0 \le u, v \le \theta$ we have

$$P[N_n(u,v) \neq 0] \sim r_n P\bigl[u\,c(n/k) + d(n/k) < X_1 \le v\,c(n/k) + d(n/k)\bigr]$$

and

$$E\bigl[N_n(u,v)^2 \mid N_n(u,v) \neq 0\bigr] = \frac{E[N_n(u,v)^2]}{P[N_n(u,v) \neq 0]} \sim 1 + r_n P\bigl[u\,c(n/k) + d(n/k) < X_1 \le v\,c(n/k) + d(n/k)\bigr] \le 1 + \text{const.}\, r_n k/n,$$

which is bounded by the choice of $r_n$. Hence condition C1 holds. Condition C2 holds as the random variables
$X_i$ are assumed to be independent. Next note that for any $0 \le u, v < \infty$,

$$\frac{1}{r_n \bar F(d(n/k))}\mathrm{Cov}\bigl(N_n(u, \infty], N_n(v, \infty]\bigr) = \frac{1}{\bar F(d(n/k))}\mathrm{Cov}\Bigl(\epsilon_{\frac{X_1 - d(n/k)}{c(n/k)}}(u, \infty],\, \epsilon_{\frac{X_1 - d(n/k)}{c(n/k)}}(v, \infty]\Bigr) \sim \frac{\bar F\bigl((u \vee v)c(n/k) + d(n/k)\bigr)}{\bar F(d(n/k))} \to \exp(-u \vee v).$$

Hence C3 holds and obviously C4 holds because of the choice of $r_n$. Hence, by (Rootzén, 2009, Theorem 2.1),

$$\sqrt{k}\bigl(\nu_n(u, \infty] - E(\nu_n(u, \infty])\bigr) \Rightarrow G \quad \text{in } D[0, \infty),$$

where $G$ is a centered Gaussian process on $[0, \infty)$ with covariance function $\exp(-u \vee v)$, and hence the time
change $u = -\ln t$ gives us that $W_n \Rightarrow W$ in $D_l(0,1]$, where $W$ is a standard Brownian motion on $[0,1]$.

□


2.3. ME plot in the Weibull case 

If $F \in D(G_\xi)$, then we have the following characterizations for the case $\xi < 0$ (Embrechts, Klüppelberg
and Mikosch, 1997; Ghosh and Resnick, 2010):

Proposition 2.5. If $\xi < 0$ then the following are equivalent:

1. $F$ has a finite right end point $x_F$ and $\bar F(x_F - x^{-1}) \in RV_{1/\xi}$.

2. $F^n(x_F + c(n)x) \to \exp\{-(-x)^{-1/\xi}\}$ for all $x \le 0$, where $c(t) = x_F - F^{\leftarrow}(1 - \tfrac{1}{t})$, $t > 1$.

3. There exists a measurable function $\beta(u)$ such that

$$\lim_{u \to x_F} \sup_{u \le x \le x_F} \bigl|F_u(x) - G_{\xi, \beta(u)}(x)\bigr| = 0.$$

Recall from Ghosh and Resnick (2010) the following result on ME plots (there is a typographical error
in the statement of the result there):

Proposition 2.6. If $X_1, \ldots, X_n$ are i.i.d. observations with distribution $F$ which has a finite right end
point $x_F$ and satisfies $1 - F(x_F - x^{-1}) \in RV_{1/\xi}$ as $x \to \infty$, then in $\mathcal{F}$,

$$\mathcal{M}_n := \frac{1}{X_{(1)} - X_{(k)}}\bigl\{(X_{(i)} - X_{(k)}, \hat M(X_{(i)})) : i = 2, \ldots, k\bigr\} \xrightarrow{P} \mathcal{M} := \Bigl\{\Bigl(t, \frac{\xi}{\xi - 1}(1 - t)\Bigr) : 0 \le t \le 1\Bigr\}. \tag{2.13}$$


In this paper we obtain the weak limit of the ME plot when the null hypothesis that $\bar F(x_F - x^{-1}) \in RV_{1/\xi}$
for some $\xi < 0$ holds. In the same spirit as Das and Ghosh (2013) we deal with the tail empirical process.
Denote by $\nu_n$:

$$\nu_n(\cdot) := \frac{1}{k}\sum_{i=1}^{n} \epsilon_{\frac{x_F - X_i}{c(n/k)}}(\cdot). \tag{2.14}$$

Following Theorem 4.2 in Resnick (2007), we can show that

$$\nu_n \Rightarrow \nu \quad \text{in } M_+[0, \infty), \tag{2.15}$$

where $\nu[0, x) = x^{-1/\xi}$, $x > 0$. Now define for $k := k(n) < n$ and $y > 0$:

$$W_n(y) := \sqrt{k}\Bigl(\frac{1}{k}\sum_{i=1}^{n} \epsilon_{\frac{x_F - X_i}{c(n/k)}}[0, y^{-\xi}) - \frac{n}{k}\bar F\bigl(x_F - c(n/k)y^{-\xi}\bigr)\Bigr) = \sqrt{k}\bigl(\nu_n[0, y^{-\xi}) - E(\nu_n[0, y^{-\xi}))\bigr). \tag{2.16}$$

The next result is in the spirit of (Resnick, 2007, Theorem 9.1) and is also similar to Lemma 2.4.

Lemma 2.7. As $n, k, n/k \to \infty$,

$$W_n \Rightarrow W$$

in $D[0, \infty)$, where $W$ is a Brownian motion in $D[0, \infty)$.

The proof follows by going through the steps of the proof of Lemma 2.4 or (Resnick, 2007, Theorem 9.1).
Let us also assume the following:
Let us also assume the following: 


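Assumption 2.8. $F$ satisfies the following:

[1] $\sqrt{k}\Bigl(\dfrac{n}{k}\bar F\bigl(x_F - c(n/k)y\bigr) - y^{-1/\xi}\Bigr) \to 0$ for all $y > 0$,

[2] $\sqrt{k}\displaystyle\int \Bigl|\dfrac{n}{k}\bar F\bigl(x_F - c(n/k)y\bigr) - y^{-1/\xi}\Bigr|\,dy \to 0$,

as $n, k, n/k \to \infty$.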


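Theorem 2.9. Suppose $X_1, \ldots, X_n$ are i.i.d. observations from a distribution $F$ which has a finite right
end point $x_F$ and satisfies $1 - F(x_F - x^{-1}) \in RV_{1/\xi}$, $\xi < 0$, as $x \to \infty$, and Assumption 2.8 holds. Then for
any $0 < \varepsilon < 1$, as $n, k, n/k \to \infty$,

$$\mathcal{MN}_n := \Bigl\{\Bigl(1 - \bigl(\tfrac{i}{k}\bigr)^{-\xi}, \tfrac{\xi}{\xi - 1}\bigl(\tfrac{i}{k}\bigr)^{-\xi}\Bigr) + \sqrt{k}\Bigl(\frac{X_{(i)} - X_{(k)}}{X_{(1)} - X_{(k)}} - 1 + \bigl(\tfrac{i}{k}\bigr)^{-\xi},\; \frac{\hat M(X_{(i)})}{X_{(1)} - X_{(k)}} - \tfrac{\xi}{\xi - 1}\bigl(\tfrac{i}{k}\bigr)^{-\xi}\Bigr) : i = \lceil \varepsilon k \rceil, \ldots, k\Bigr\}$$

$$\Rightarrow \mathcal{MN} := \Bigl\{\Bigl(1 - t^{-\xi} + \xi t^{-(1+\xi)} B(t),\; \tfrac{\xi}{\xi - 1}\, t^{-\xi} + \xi t^{-1}\int_0^t y^{-(1+\xi)} B(y)\,dy\Bigr),\ \varepsilon \le t \le 1\Bigr\} \quad \text{in } \mathcal{F},$$

where $B(t)$ is the standard Brownian bridge on $[0,1]$ restricted to $(0,1]$.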


Remark 2.10. This result is similar to the one obtained for $\bar F \in RV_{-1/\xi}$ with $\xi > 0$ in Theorem 4.3 of Das
and Ghosh (2013); the subtle difference is that we no longer need to restrict the range of $\xi$
as is done there with $0 < \xi < 1/2$, since the integral

$$\int_0^1 y^{-(1+\xi)} B(y)\,dy = \int_0^1 y^{-(1+\xi)} W(y)\,dy - W(1)\int_0^1 y^{-\xi}\,dy$$

exists if and only if $\int_0^1 s^{-2\xi}\,ds < \infty$, which is always true for $\xi < 0$ and in turn implies that the limit $\mathcal{MN}$
exists. The truncation to $\varepsilon \le t \le 1$ is still necessary to guarantee that the limit set $\mathcal{MN}$ does not
blow up for $t$ near 0.

Proof. The proof is omitted here as it follows using similar arguments as in the proof of (Das and Ghosh,
2013, Theorem 4.3). The difference is that we use the weak convergence result of
Lemma 2.7 as our basis and apply a suitable version of Vervaat's lemma and 'converging together' arguments
to obtain the result. □
3. Creating confidence bounds from the limit results

In Section 2 we obtained weak limits for ME plots for different values of $\xi \in \mathbb{R}$, where the underlying distribution
$F \in D(G_\xi)$. Now, depending on the value of $\xi$, we construct the corresponding confidence bounds from
these results. We resort to Monte Carlo simulation for actually computing the limits since most of them require
calculating quantiles of suprema of functionals of Brownian bridges over a finite interval or quantiles of stable
distributions.

We need to truncate the ME plot near infinity in all the cases since the weak limits we obtain blow up
there (this corresponds to $t$ near 0 in the limit of $\mathcal{MN}_n$).
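The Monte Carlo step can be sketched as follows. This is a minimal illustration of ours, not the authors' implementation: it simulates Brownian bridges on a regular grid, evaluates a user-supplied functional on each path, and returns an empirical quantile of the supremum of its absolute value over $\varepsilon \le t \le 1$. The function names, the grid size and the number of paths are all assumptions, and the example functional $\xi t^{-(1+\xi)}B(t)$ is our reading of the first coordinate of the limit in the finite-variance Fréchet case; it should be checked against Das and Ghosh (2013) before use.

```python
import numpy as np

def brownian_bridge_paths(n_paths, n_steps, rng):
    """Simulate standard Brownian bridges on the grid t_j = j/n_steps, j = 0..n_steps."""
    dt = 1.0 / n_steps
    increments = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
    w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)
    t = np.linspace(0.0, 1.0, n_steps + 1)
    return t, w - t * w[:, [-1]]          # B(t) = W(t) - t W(1)

def sup_quantile(functional, eps, level, n_paths=2000, n_steps=1000, seed=0):
    """Empirical `level`-quantile of sup_{eps <= t <= 1} |functional(t, B)(t)|."""
    rng = np.random.default_rng(seed)
    t, b = brownian_bridge_paths(n_paths, n_steps, rng)
    t, b = t[1:], b[:, 1:]                # drop t = 0 so negative powers of t are defined
    values = np.abs(functional(t, b))     # evaluated on the whole grid, path by path
    sup = values[:, t >= eps].max(axis=1)
    return float(np.quantile(sup, level))

# Illustrative use (xi and eps are hypothetical inputs):
xi, eps = 0.25, 0.2
c_95 = sup_quantile(lambda t, b: xi * t ** (-(1.0 + xi)) * b, eps, 0.95)
```

Functionals involving an integral such as $\int_0^t y^{-(1+\xi)}B(y)\,dy$ can be handled in the same way by replacing the integral with a cumulative sum over the grid inside `functional`. A finer grid and more paths reduce the discretisation and sampling error at the cost of memory.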

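3.1. Fréchet case

This case has already been discussed in Das and Ghosh (2013) and we recall it here for the sake of
completeness. Define the truncated versions of $\mathcal{M}_n$ defined in (2.1) and its limit $\mathcal{M}$ respectively for $0 < \varepsilon < 1$
as:

$$\mathcal{M}_n^{\varepsilon} := \frac{1}{X_{(k)}}\bigl\{(X_{(i)}, \hat M(X_{(i)})) : i = \lceil k\varepsilon \rceil, \ldots, k\bigr\} \quad \text{and} \quad \mathcal{M}^{\varepsilon} := \Bigl\{\Bigl(t^{-\xi}, \frac{\xi}{1-\xi}\, t^{-\xi}\Bigr) : \varepsilon \le t \le 1\Bigr\}. \tag{3.1}$$

Then $\mathcal{M}_n^{\varepsilon} \xrightarrow{P} \mathcal{M}^{\varepsilon}$.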

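Case 1 ($0 < \xi < 1/2$): From (2.2), we have the $(1 - \alpha)100\%$ confidence band for $\mathcal{M}^{\varepsilon}$ as

$$\mathcal{CM}_n^{\varepsilon} := \mathcal{M}_n^{\varepsilon} + \Bigl\{(x, y) : x \in \Bigl(-\frac{c_{\alpha/2,\varepsilon}}{\sqrt{k}}, \frac{c_{\alpha/2,\varepsilon}}{\sqrt{k}}\Bigr),\ y \in \Bigl(-\frac{d_{\alpha/2,\varepsilon}}{\sqrt{k}}, \frac{d_{\alpha/2,\varepsilon}}{\sqrt{k}}\Bigr)\Bigr\}, \tag{3.2}$$

where

$$c_{\alpha,\varepsilon} = (1 - \alpha)\text{-th quantile of } \sup_{\varepsilon \le t \le 1} \bigl|\xi t^{-(1+\xi)} B(t)\bigr|, \tag{3.3}$$

$$d_{\alpha,\varepsilon} = (1 - \alpha)\text{-th quantile of } \sup_{\varepsilon \le t \le 1} \Bigl|\xi t^{-1}\int_0^t y^{-(1+\xi)} B(y)\,dy\Bigr|. \tag{3.4}$$

Since the weak limit of the properly scaled and shifted $\mathcal{M}_n^{\varepsilon}$ consists of functionals of the same Brownian
bridge in both components, (3.2) provides an asymptotic confidence bound around $\mathcal{M}^{\varepsilon}$ with $P(\mathcal{M}^{\varepsilon} \subset
\mathcal{CM}_n^{\varepsilon}) \ge (1 - \alpha)$ for large $n$.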

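Case 2 ($1/2 < \xi < 1$): From (2.3), we have the $(1 - \alpha)100\%$ confidence band for $\mathcal{M}^{\varepsilon}$ as

$$\mathcal{CM}_n^{\varepsilon} = \Bigl\{\Bigl(\frac{X_{(\lceil kt \rceil)}}{X_{(k)}}, \frac{\hat M(X_{(\lceil kt \rceil)})}{X_{(k)}}\Bigr) + \Bigl(-\frac{c_{\alpha_1/2,\varepsilon}}{\sqrt{k}}, \frac{c_{\alpha_1/2,\varepsilon}}{\sqrt{k}}\Bigr) \times \Bigl(\frac{X_{(1)}\, d_{1-\alpha_2/2}}{\lceil kt \rceil X_{(k)}}, \frac{X_{(1)}\, d_{\alpha_2/2}}{\lceil kt \rceil X_{(k)}}\Bigr) : \varepsilon \le t \le 1\Bigr\}, \tag{3.5}$$

where

$$d_{\alpha} = (1 - \alpha)\text{-th quantile of } S_{1/\xi} \text{ defined in (2.4)}.$$

Here $0 < \alpha_1, \alpha_2 < 1$ are chosen such that $(1 - \alpha) = (1 - \alpha_1)(1 - \alpha_2)$. Since the random components in the
first and second coordinates of the limit in (2.3) are independent, this gives us the right confidence interval
so that $P(\mathcal{M}^{\varepsilon} \subset \mathcal{CM}_n^{\varepsilon}) \ge 1 - \alpha$. The above quantiles are calculated using Monte Carlo simulation methods.
In real data examples $\xi$ is estimated using a Hill estimator, or any reasonable estimator for the tail index of
a heavy-tailed distribution.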
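For reference, the classical Hill estimator of $\xi$ based on the $k$ largest observations is $H_{k,n} = \frac{1}{k}\sum_{i=1}^{k} \ln\bigl(X_{(i)}/X_{(k+1)}\bigr)$, with order statistics in decreasing order. The sketch below is ours (the function name `hill_estimator` is illustrative) and assumes the upper order statistics used are strictly positive.

```python
import numpy as np

def hill_estimator(sample, k):
    """Hill estimate of xi = 1/alpha from the k largest observations."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]   # X_(1) >= ... >= X_(n)
    if not 1 <= k < x.size:
        raise ValueError("k must satisfy 1 <= k < n")
    if x[k] <= 0:
        raise ValueError("X_(k+1) must be positive")
    # Average log-spacing of the top k order statistics relative to X_(k+1).
    return float(np.mean(np.log(x[:k] / x[k])))
```

In practice the estimate is usually examined as a function of $k$ (a Hill plot), since the choice of $k$ trades bias against variance.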
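3.2. Gumbel case

This is the case where $F \in D(G_0)$. Many well-known distribution functions such as the exponential, normal and
log-normal distributions fall into this class. First we define the truncated versions of $\mathcal{M}_n$ defined in (2.7)
and its limit $\mathcal{M}$ respectively for $0 < \varepsilon < 1$ as:

$$\mathcal{M}_n^{\varepsilon} := \frac{1}{X_{(\lceil k/e \rceil)} - X_{(k)}}\bigl\{(X_{(i)} - X_{(k)}, \hat M(X_{(i)})) : i = \lceil k\varepsilon \rceil, \ldots, k\bigr\} \quad \text{and} \quad \mathcal{M}^{\varepsilon} := \{(-\ln t, 1) : \varepsilon \le t \le 1\}. \tag{3.6}$$

Then $\mathcal{M}_n^{\varepsilon} \xrightarrow{P} \mathcal{M}^{\varepsilon}$. Using Theorem 2.3, we have the $(1 - \alpha)100\%$ confidence band for $\mathcal{M}^{\varepsilon}$ as

$$\mathcal{CM}_n^{\varepsilon} := \mathcal{M}_n^{\varepsilon} + \Bigl\{(x, y) : x \in \Bigl(-\frac{c_{\alpha/2,\varepsilon}}{\sqrt{k}}, \frac{c_{\alpha/2,\varepsilon}}{\sqrt{k}}\Bigr),\ y \in \Bigl(-\frac{d_{\alpha/2,\varepsilon}}{\sqrt{k}}, \frac{d_{\alpha/2,\varepsilon}}{\sqrt{k}}\Bigr)\Bigr\}, \tag{3.7}$$

where

$$c_{\alpha,\varepsilon} = (1 - \alpha)\text{-th quantile of } \sup_{\varepsilon \le t \le 1} \Bigl|eB(e^{-1})\ln t + \frac{B(t)}{t}\Bigr|, \tag{3.8}$$

$$d_{\alpha,\varepsilon} = (1 - \alpha)\text{-th quantile of } \sup_{\varepsilon \le t \le 1} \Bigl|eB(e^{-1}) + \frac{1}{t}\int_0^t \frac{B(y)}{y}\,dy\Bigr|. \tag{3.9}$$

By the same logic as in the earlier cases, (3.7) provides an asymptotic confidence bound around $\mathcal{M}^{\varepsilon}$ with
$P(\mathcal{M}^{\varepsilon} \subset \mathcal{CM}_n^{\varepsilon}) \ge (1 - \alpha)$ for large $n$. The quantiles are obtained using Monte Carlo simulation.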
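To make the construction concrete, the following sketch of ours computes the truncated, normalised ME plot points in the Gumbel case (differences from $X_{(k)}$ scaled by $X_{(\lceil k/e \rceil)} - X_{(k)}$) and then widens each point into a rectangle of half-widths $c/\sqrt{k}$ and $d/\sqrt{k}$. The names `gumbel_me_points` and `me_band` are illustrative, the quantiles `c_q` and `d_q` are assumed to be supplied from a separate Monte Carlo step, and ties in the sample are assumed absent.

```python
import math
import numpy as np

def gumbel_me_points(sample, k, eps):
    """Normalised ME plot points for ranks i = ceil(k*eps), ..., k (Gumbel scaling)."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]   # X_(1) >= ... >= X_(n)
    if not 2 <= k <= x.size or not 0.0 < eps < 1.0:
        raise ValueError("need 2 <= k <= n and 0 < eps < 1")
    scale = x[math.ceil(k / math.e) - 1] - x[k - 1]       # X_(ceil(k/e)) - X_(k)
    i = np.arange(max(math.ceil(k * eps), 2), k + 1)
    csum = np.cumsum(x)
    mean_excess = csum[i - 2] / (i - 1) - x[i - 1]        # empirical ME at X_(i), no ties
    return (x[i - 1] - x[k - 1]) / scale, mean_excess / scale

def me_band(px, py, c_q, d_q, k):
    """Rectangular band: each point widened by c_q/sqrt(k) in x and d_q/sqrt(k) in y."""
    hx, hy = c_q / math.sqrt(k), d_q / math.sqrt(k)
    return px - hx, px + hx, py - hy, py + hy
```

If the data are in the Gumbel domain, the second coordinate returned by `gumbel_me_points` should hover around 1 and the first coordinate should range roughly from $-\ln \varepsilon$ down to 0, so one checks whether the horizontal line at height 1 stays inside the band.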


3.3. Weibull case 


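In this case $F \in D(G_\xi)$ with $\xi < 0$. Many distributions, especially those with a bounded right-hand tail, fall into
this category, for example Uniform, Beta, etc. Here we define the truncated versions of $\mathcal{M}_n$ as defined in
Proposition 2.6 and its limit $\mathcal{M}$ respectively for $0 < \varepsilon < 1$ as:

$$\mathcal{M}_n^{\varepsilon} := \frac{1}{X_{(1)} - X_{(k)}}\bigl\{(X_{(i)} - X_{(k)}, \hat M(X_{(i)})) : i = \lceil k\varepsilon \rceil, \ldots, k\bigr\} \quad \text{and} \quad \mathcal{M}^{\varepsilon} := \Bigl\{\Bigl(1 - t^{-\xi}, \frac{\xi}{\xi - 1}\, t^{-\xi}\Bigr) : \varepsilon \le t \le 1\Bigr\}. \tag{3.10}$$

We have the $(1 - \alpha)100\%$ confidence band for $\mathcal{M}^{\varepsilon}$ as

$$\mathcal{CM}_n^{\varepsilon} := \mathcal{M}_n^{\varepsilon} + \Bigl\{(x, y) : x \in \Bigl(-\frac{c_{\alpha/2,\varepsilon}}{\sqrt{k}}, \frac{c_{\alpha/2,\varepsilon}}{\sqrt{k}}\Bigr),\ y \in \Bigl(-\frac{d_{\alpha/2,\varepsilon}}{\sqrt{k}}, \frac{d_{\alpha/2,\varepsilon}}{\sqrt{k}}\Bigr)\Bigr\}, \tag{3.11}$$

where

$$c_{\alpha,\varepsilon} = (1 - \alpha)\text{-th quantile of } \sup_{\varepsilon \le t \le 1} \bigl|\xi t^{-(1+\xi)} B(t)\bigr|, \tag{3.12}$$

$$d_{\alpha,\varepsilon} = (1 - \alpha)\text{-th quantile of } \sup_{\varepsilon \le t \le 1} \Bigl|\xi t^{-1}\int_0^t y^{-(1+\xi)} B(y)\,dy\Bigr|. \tag{3.13}$$

The bounds obtained here are very similar to the ones in the Fréchet case. Using the same argument,
(3.11) provides an asymptotic confidence bound around $\mathcal{M}^{\varepsilon}$ with $P(\mathcal{M}^{\varepsilon} \subset \mathcal{CM}_n^{\varepsilon}) \ge (1 - \alpha)$ for large $n$.
Similar to the previous cases, the quantiles are obtained using Monte Carlo simulation.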

















4. Examples: Simulated and real data


This section is devoted to applying the methodology for constructing confidence intervals
around ME plots derived in Section 3.



Figure 1. ME Plot for 10000 i.i.d. Generalized Pareto random variables with $\xi = -0.5$ and $\beta = 1$.


Given an iid sample $X_1, \ldots, X_n \sim F$, we are concerned with detecting if $F \in D(G_\xi)$ and, if so, whether $\xi$
is positive (the Fréchet case), zero (the Gumbel case) or negative (the Weibull case). The Fréchet case has
been discussed in Das and Ghosh (2013) with examples. Hence we concern ourselves with the other two cases
for the simulated examples. First we see how our confidence intervals work in simulated examples, and then
use them on real data. In all the plots below, the light blue shade marks a 95% confidence interval and the
dark blue shade marks a 90% confidence interval.

Figure 2. ME Plot for 10000 i.i.d. Beta(a, b) random variables with $a = 2$, $b = 2$.


4.1. Simulated examples: Weibull 

In this case we have $F \in D(G_\xi)$ with $\xi < 0$. We consider two families of distributions here.

1. Consider $F \sim \mathrm{GPD}(\xi, \beta)$ with distribution function given by

$$F(x) = 1 - \Bigl(1 + \frac{\xi x}{\beta}\Bigr)^{-1/\xi}, \qquad 1 + \frac{\xi x}{\beta} > 0,\ \beta > 0,\ \xi < 0.$$

Of course, $F \in D(G_\xi)$ here and $F$ is in the Weibull domain of attraction if $\xi < 0$. In fact the Uniform
(0,1) distribution falls into this class with $\xi = -1$ and $\beta = 1$.

For our simulation example we take $\xi = -0.5$, $\beta = 1$ and generate 10000 iid samples from the distribution.
The two plots on the left of Figure 1 are the Pickands and Moment estimates of $\xi$ for increasing numbers
of top order statistics used. They seem reasonably close to $-0.5$. For $k = 800, 1000$ and $\delta = 0.2, 0.3$ we
create confidence bounds around the ME plot (in black) which clearly cover the dashed red line with
slope $-0.5$.

2. Next consider $F \sim \mathrm{Beta}(a, b)$ with pdf given by

$$f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1-x)^{b-1}, \qquad 0 < x < 1,\ a, b > 0.$$

In this case $F \in D(G_{-1/b})$. We take the example $a = 2$, $b = 2$, where $F \in D(G_{-0.5})$. As observed in the
previous example, we see that the Pickands and Moment estimates approximate $-0.5$ well; see Figure 2. We
again create confidence bounds with $k = 800, 1000$ and $\delta = 0.2, 0.3$ and observe that the bounds cover
the dashed red line with slope $-0.5$.

Thus the detection in the Weibull family looks reasonable.
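The Pickands estimate referred to in these examples can be sketched as follows, using the classical form $\hat\xi_k = \frac{1}{\ln 2}\ln\frac{X_{(k)} - X_{(2k)}}{X_{(2k)} - X_{(4k)}}$ with decreasing order statistics. This is our own minimal illustration (the function name `pickands_estimator` is ours); the exact variant and the range of $k$ used for the figures are not specified in the text.

```python
import numpy as np

def pickands_estimator(sample, k):
    """Pickands estimate of xi from the order statistics X_(k), X_(2k), X_(4k)."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]   # X_(1) >= ... >= X_(n)
    if k < 1 or 4 * k > x.size:
        raise ValueError("need 1 <= k and 4k <= n")
    num = x[k - 1] - x[2 * k - 1]
    den = x[2 * k - 1] - x[4 * k - 1]
    return float(np.log(num / den) / np.log(2.0))

# Sanity check on exact GPD quantiles with xi = -0.5, beta = 1 (no randomness):
n, xi = 4000, -0.5
ranks = np.arange(1, n + 1)
quantiles = (1.0 / xi) * ((ranks / (n + 1)) ** (-xi) - 1.0)
xi_hat = pickands_estimator(quantiles, 500)
```

On this idealised input the ratio of spacings equals $2^{\xi}$ exactly, so `xi_hat` recovers $-0.5$ up to rounding; on simulated or real data the estimate fluctuates with $k$, which is why it is plotted against the number of top order statistics.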

4.2. Simulated examples: Gumbel 

Distributions in the Gumbel domain of attraction are harder to detect since a data sample has to form a
plot with slope zero in this case, which is statistically unlikely. Hence confidence bounds help to an extent,
although, as we will see through the three examples below, in practice the plotting technique is helpful
to different degrees in different cases.



Figure 3. ME Plot for 10000 i.i.d. Exponential random variables with $\lambda = 1$.



Figure 4. ME Plot for 10000 i.i.d. Standard Normal random variables.


1. The first example is where $F$ follows Exp(1). We generate 10000 iid samples from the distribution
and create ME plots with parameters $k = 800, 1000$ and $\delta = 0.2, 0.3$; see Figure 3. The Pickands and
Moment estimates are close to zero and the confidence intervals around the ME plot in the four different
cases all cover the line with slope 0 (and intercept 1) as expected. So here the detection technique
works well.

2. The next data set we look at is a sample generated from $F$ which follows $N(0,1)$. We again generate
10000 iid samples from the distribution and make ME plots with parameters $k = 800, 1000$ and $\delta =
0.2, 0.3$; see Figure 4. In this case the Pickands estimate is close to zero but the Moment estimate,
though close to zero, seems to be an underestimate. The confidence intervals around the ME plot in the
four different cases all cover the dashed red line with slope 0 (and intercept 1) up to some point and
then fail to cover it. We can believe that $F \in D(G_0)$ but the case is less convincing than in the previous
example.






Figure 5. ME Plot for 10000 i.i.d. Standard log-normal random variables. Panels (Gumbel): $k = 1000$, $\delta = 0.2$; $k = 1000$, $\delta = 0.3$; $k = 800$, $\delta = 0.2$; $k = 800$, $\delta = 0.3$.


3. Finally we look into $F$ which is a standard Lognormal distribution. It is known that a Lognormal
distribution belongs to $D(G_0)$, but on the other hand it has no finite exponential moments, that is,
$E[e^{tX}] = \infty$ for every $t > 0$ (unlike the Normal or Exponential case). So it is subexponential,
and in that sense heavy-tailed, although it belongs to the Gumbel domain of attraction.

We simulate 10000 i.i.d. samples from a standard Lognormal distribution and create ME plots as in the
previous cases. The results are in Figure 5. Both the Pickands estimate and the Moment estimate of
the extreme value parameter are much higher than the true value, which is zero. The ME plot with
confidence intervals around it misses the target red dashed line of slope zero (and intercept 1); a larger
choice of $\delta$ would make the confidence intervals large enough to cover the line, but clearly our technique
does not perform well here. Since the Lognormal distribution has heavy tails, the ME plot tends to have
a positive slope, as would happen when $F$ is in the Frechet domain of attraction.
Hence, overall, we need to be more careful with this technique when detecting the Gumbel domain of
attraction.
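The contrast between the Exponential and Lognormal examples can be reproduced in a few lines. The sketch below computes the standard empirical mean excess function, that is, the average excess over a threshold taken at the upper order statistics. It is an illustration under our own assumptions: the helper name `mean_excess_points` and the seed are hypothetical, and no normalization or confidence bound from the construction above is reproduced.

```python
import random


def mean_excess_points(sample, k):
    """Return [(u, M(u))] for thresholds u at the 2nd, ..., (k+1)-th largest
    observations, where M(u) is the average of (x - u) over observations x > u.
    Assumes no ties (continuous data) and k < len(sample)."""
    xs = sorted(sample, reverse=True)
    points = []
    running_sum = 0.0  # sum of the i largest observations
    for i in range(1, k + 1):
        running_sum += xs[i - 1]
        u = xs[i]  # threshold: the (i+1)-th largest observation
        points.append((u, running_sum / i - u))
    return points


rng = random.Random(7)  # arbitrary seed, used only for reproducibility
n, k = 10_000, 1000
exp_sample = [rng.expovariate(1.0) for _ in range(n)]
logn_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]

exp_me = mean_excess_points(exp_sample, k)
logn_me = mean_excess_points(logn_sample, k)

# Mean excess at the lowest threshold used (about the 90% sample quantile).
print("Exp(1):    ", exp_me[-1])
print("Lognormal: ", logn_me[-1])
```

For Exp(1) the mean excess function is identically 1, so the plotted points should scatter around a horizontal line. For the Lognormal sample the values grow with the threshold, which is the positive drift described in the third example.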

4.3. Observed data: Ozone concentration at Zurich urban area 

It is of interest for environmental scientists to study ozone concentration near urban conglomerations, as
its presence in the atmosphere implies health risks related to respiratory diseases. Directive 2008/50/EC of
the European Parliament sets the target value of ozone for its member states at 120 $\mu$g/m$^3$. The
directive says that, as of January 1, 2010, ozone concentrations should not exceed this limit on more than 25
days in a calendar year, where the daily calculation is based on the maximum of daily 8-hour averages.


Figure 6. ME Plot for Ozone concentration in Zurich urban area. Panels: Daily Maximum Ozone content in Zurich Urban Area; Daily Maximum Ozone (homoscedasticized); ME plots for $k = 658$ with $\delta = 0.2, 0.3$ (Weibull par: 0.14) and for $k = 526$ with $\delta = 0.2, 0.3$ (Weibull par: 0.11).


We study a data set freely available from www.eea.europa.eu. The data set contains daily maxima of
ozone concentration (in $\mu$g/m$^3$) from one station in Zurich, Switzerland (station code CH 0010A,
Zurich-Kaserne), located 410 m above sea level. Data is observed from January 1, 1992 to December 31, 2009.
Measurements were unavailable for 22 days, which we impute by the average value of ozone concentration
on the same day in the other available years.

As seen in the top left plot in Figure 6, the data clearly exhibits periodicity. Moreover, it is likely that the
data is heteroscedastic. So we homoscedasticize the data by dividing the value on each date by the standard
deviation of the values on the same day over all the 18 years of data available. Since our techniques work
for stationary data sets, we fit an AR(38) process to the data set (the order 38 is chosen by the AIC criterion)
and observe (from the ACF; see the second plot in the second line of Figure 6) that the residuals (first plot
in the second line) look independent. Now we analyze the extremal behavior of the residuals of the model.
The Pickands and Moment estimates give a negative value close to 0, and we can hypothesise that the
sample is from a distribution in the Weibull domain of attraction. But since the value of the parameter is
close to 0, we also check whether the data is possibly from the Gumbel domain of attraction. The confidence
bounds (90% deep blue and 95% light blue) are created assuming $F \in D(G_0)$ for $k = 329$ and $k = 658$,
which are 5% and 10% of the data set, and with $\delta = 0.2, 0.3$. Observe that the 90% bounds tend to reject
the hypothesis $F \in D(G_0)$ while the 95% bounds do not. This is most likely a result of the parameter being
close to zero.
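The scaling step described above can be sketched as follows. This is an illustration on synthetic data, not the ozone series: the helper name `scale_by_calendar_day`, the use of the sample standard deviation, the 365-day years (leap days ignored) and the synthetic seasonal model are all our assumptions.

```python
import math
import random
import statistics


def scale_by_calendar_day(values_by_year):
    """Divide each value by the sample standard deviation of the values
    recorded on the same calendar day across all years.

    values_by_year: list of equally long lists, one list per year.
    """
    n_days = len(values_by_year[0])
    day_sd = [
        statistics.stdev([year[d] for year in values_by_year])
        for d in range(n_days)
    ]
    return [[year[d] / day_sd[d] for d in range(n_days)] for year in values_by_year]


# Synthetic stand-in for 18 years of daily maxima: a seasonal level plus
# noise whose spread also changes with the season (purely hypothetical).
rng = random.Random(3)
synthetic = [
    [
        60.0
        + 30.0 * math.sin(2 * math.pi * d / 365)
        + rng.gauss(0.0, 5.0 + 4.0 * math.sin(2 * math.pi * d / 365))
        for d in range(365)
    ]
    for _ in range(18)
]
scaled = scale_by_calendar_day(synthetic)
```

After this step every calendar day has unit standard deviation across years. The seasonal level is still present, which is why a time series model is fitted afterwards.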

On the other hand, using the Pickands estimate of the tail index we get $\xi = -0.14$ (for $k = 658$)
and $\xi = -0.11$ (for $k = 329$), and the confidence bounds (again 90% deep blue and 95% light blue) for
$\delta = 0.2, 0.3$ cover the straight line with slope $\xi$ quite well. Hence we expect that the underlying
distribution is in fact in the Weibull domain of attraction with parameter close to $\xi = -0.1$.


4.4. Observed data: Flow-rates at river Aare 


The other data set we analyze consists of maximum daily flow-rates of the river Aare. The Aare flows through
Switzerland, and some manufacturing and power plants located near the river are often concerned about flooding
of the river. The data we analyse has been collected from the Federal Office for the Environment (FOEN), Bern,
and generously provided to us by Kernkraftwerk Gösgen-Däniken. It pertains to daily maximum flow-rates
of the Aare at the measurement station Aare-Murgenthal (2063), measured in m$^3$/sec from 1st January 1974 to
20th October 2010. See also www.hydrodaten.admin.ch/d/2063.htm.

Note that the data may contain measurement errors, since automated measurement at the
specific station started only in 1993. Moreover, the control authorities aim to maintain the flow-rate of the Aare
at the Aare-Murgenthal (2063) station below 850 m$^3$/sec and do so by opening or closing log-gates.
This manual intervention hinders the possibility of the data set being stationary. We were informed that such
manual intervention has been done a couple of times.

To analyse the data, we first note the seasonality pattern in the data set; see the top left plot in
Figure 7. Hence, as in the previous example, we fit an AR process and work with the residuals obtained after
the model-fitting. Observe that the Pickands and Moment estimates both point towards a small but positive
value of the extreme value parameter, but do not completely reject the possibility of it being zero. We
again create 90% (dark blue) and 95% (light blue) confidence bounds under the Gumbel assumption for
$k = 673$ and $k = 1345$ (again 5% and 10% of the sample size) and $\delta = 0.1, 0.2$. The detection technique
seems to reject the hypothesis that $F \in D(G_0)$.
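The text does not say how the AR model is fitted. As one possible reading, the sketch below selects an AR order by AIC using Yule-Walker estimates computed with the Levinson-Durbin recursion, and returns the residuals on which a tail analysis would then be run. The function names, the AIC form $n \log \hat\sigma_p^2 + 2p$, the simulated AR(2) series and the seed are all our assumptions.

```python
import math
import random


def autocovariances(x, max_lag):
    """Sample autocovariances at lags 0..max_lag (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t - h] for t in range(h, n)) / n for h in range(max_lag + 1)]


def fit_ar_by_aic(x, max_order):
    """Yule-Walker AR fits for orders 0..max_order via Levinson-Durbin;
    returns the coefficient list of the order with the smallest AIC."""
    n = len(x)
    gamma = autocovariances(x, max_order)
    phi, sigma2 = [], gamma[0]
    best_aic, best_phi = n * math.log(sigma2), []
    for m in range(1, max_order + 1):
        # Reflection coefficient (partial autocorrelation at lag m).
        kappa = (gamma[m] - sum(phi[j] * gamma[m - 1 - j] for j in range(m - 1))) / sigma2
        phi = [phi[j] - kappa * phi[m - 2 - j] for j in range(m - 1)] + [kappa]
        sigma2 *= 1.0 - kappa * kappa  # innovation variance of the order-m fit
        aic = n * math.log(sigma2) + 2 * m
        if aic < best_aic:
            best_aic, best_phi = aic, phi
    return best_phi


def ar_residuals(x, phi):
    """One-step prediction errors of the mean-centred series."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    p = len(phi)
    return [c[t] - sum(phi[j] * c[t - 1 - j] for j in range(p)) for t in range(p, len(c))]


# Hypothetical test series: AR(2) with coefficients 0.5 and -0.3.
rng = random.Random(11)
series = [0.0, 0.0]
for _ in range(5200):
    series.append(0.5 * series[-1] - 0.3 * series[-2] + rng.gauss(0.0, 1.0))
series = series[202:]  # drop the two start values and a burn-in of 200

phi_hat = fit_ar_by_aic(series, max_order=10)
residuals = ar_residuals(series, phi_hat)
print(len(phi_hat), [round(v, 3) for v in phi_hat[:2]])
```

AIC may select an order somewhat larger than the true one; for the purpose here what matters is that the residuals look uncorrelated before the ME plot is drawn.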

Now we allow a Pickands estimate to choose the extreme value parameter, which gives a value of around
$\xi = 0.16$ (for $k = 673$ and $k = 1345$), and the confidence bounds seem to support that the data is from a
distribution in the Frechet domain of attraction. Thus we may conclude that the flow-rate data at the
Murgenthal station is perhaps heavy-tailed, even if only marginally so.
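The values of $k$ quoted for the two data sets can be checked with a little date arithmetic. The text does not state how the sample size was counted; taking it to be the number of calendar days in each observation window and rounding 5% and 10% upwards reproduces the quoted values. This is our inference from the numbers, and the helper name `top_fraction_counts` is hypothetical.

```python
import math
from datetime import date


def top_fraction_counts(first_day, last_day, percents=(5, 10)):
    """Number of daily observations in [first_day, last_day] and the
    rounded-up counts corresponding to the given percentages."""
    n = (last_day - first_day).days + 1  # both end points are included
    return n, [math.ceil(n * p / 100) for p in percents]


# Ozone record: January 1, 1992 to December 31, 2009.
print(top_fraction_counts(date(1992, 1, 1), date(2009, 12, 31)))
# -> (6575, [329, 658])

# Aare record: 1st January 1974 to 20th October 2010.
print(top_fraction_counts(date(1974, 1, 1), date(2010, 10, 20)))
# -> (13442, [673, 1345])
```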





Figure 7. ME Plot for Aare river flow data.